Planning with Macro-Actions in Decentralized POMDPs
Authors
Abstract
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions which may require different amounts of time to execute. We model macro-actions as options in a factored Dec-POMDP model, focusing on options which depend only on information available to an individual agent while executing. This enables us to model systems where coordination decisions occur only at the level of deciding which macro-actions to execute, and the macro-actions themselves can then be executed to completion. The core technical difficulty when using options in a Dec-POMDP is that the options chosen by the agents no longer terminate at the same time. We present extensions of two leading Dec-POMDP algorithms for generating a policy with options and discuss the resulting form of optimality. Our results show that these algorithms retain agent coordination while allowing near-optimal solutions to be generated for significantly longer horizons and larger state spaces than previous Dec-POMDP methods.
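To make the option-based view concrete, the Python sketch below shows a macro-action as a local policy paired with a termination test, both of which see only the agent's own observation history. All class and function names here are our own illustrative assumptions, not the paper's implementation. Because termination is checked locally, agents generally finish their options at different times and re-select options asynchronously, which is exactly the coordination difficulty the abstract highlights.

"""Minimal sketch of option-style macro-actions with locally observed termination
in a decentralized setting. All names are illustrative assumptions, not the
paper's implementation."""

import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Option:
    """A macro-action: a local policy plus a termination test, both of which
    see only the agent's own observation history."""
    name: str
    act: Callable[[List[str]], str]          # local history -> primitive action
    terminates: Callable[[List[str]], bool]  # local history -> finished?

@dataclass
class Agent:
    select_option: Callable[[List[str]], Option]  # high-level policy over macro-actions
    history: List[str] = field(default_factory=list)
    current: Optional[Option] = None

    def step(self, observation: str) -> str:
        self.history.append(observation)
        # Coordination decisions happen only here, when an option is (re)selected.
        # Termination is checked on local information, so agents generally do not
        # finish their options in lockstep.
        if self.current is None or self.current.terminates(self.history):
            self.current = self.select_option(self.history)
            self.history = [observation]  # restart the local history for the new option
        return self.current.act(self.history)

# Toy usage: each agent repeatedly runs a "navigate" macro lasting a random number
# of primitive steps, so the two agents re-select macro-actions at different times.
def make_navigate(label: str) -> Option:
    duration = random.randint(2, 5)
    return Option(
        name=f"navigate-{label}",
        act=lambda hist: "move",
        terminates=lambda hist: len(hist) >= duration,
    )

agents = [Agent(select_option=lambda hist, i=i: make_navigate(str(i))) for i in range(2)]
for t in range(10):
    actions = [agent.step(observation=f"obs@{t}") for agent in agents]
    print(t, [agent.current.name for agent in agents], actions)

Running the toy loop prints the two agents switching options at different time steps, illustrating the asynchronous-termination issue that the proposed algorithms must handle.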
Similar References
Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments
Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general framework for multiagent sequential decision-making under uncertainty. Although Dec-POMDPs are typically intractable to solve for real-world problems, recent research on macro-actions (i.e., temporally-extended actions) has significantly increased the size of problems that can be solved. However, current...
Learning for Multiagent Decentralized Control in Large Partially Observable Stochastic Environments
This paper presents a probabilistic framework for learning decentralized control policies for cooperative multiagent systems operating in a large partially observable stochastic environment based on batch data (trajectories). In decentralized domains, because of communication limitations, the agents cannot share their entire belief states, so execution must proceed based on local information. D...
Monte Carlo Value Iteration with Macro-Actions
POMDP planning faces two major computational challenges: large state spaces and long planning horizons. The recently introduced Monte Carlo Value Iteration (MCVI) can tackle POMDPs with very large discrete state spaces or continuous state spaces, but its performance degrades when faced with long planning horizons. This paper presents Macro-MCVI, which extends MCVI by exploiting macro-actions fo...
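The general idea behind exploiting macro-actions in Monte Carlo planning can be sketched as follows. This is a simplified illustration with assumed names, not the Macro-MCVI code: the simulator runs a macro's internal policy until its termination condition fires and returns one aggregated discounted reward, so the planner branches only at macro boundaries and a handful of macro-level decisions can cover a long primitive horizon.

"""Illustrative sketch of macro-actions in Monte Carlo planning (assumed names,
not the Macro-MCVI implementation): a whole macro is simulated to termination
and contributes one aggregated reward, so the search branches only at macro
boundaries instead of at every primitive time step."""

import random
from collections import namedtuple

# A macro bundles an internal policy with a termination test on the state.
Macro = namedtuple("Macro", ["policy", "done"])

def simulate_macro(state, macro, step_fn, gamma=0.95, max_len=100):
    """Run one macro to termination in a generative model.
    step_fn(state, action) -> (next_state, reward)."""
    total, discount, steps = 0.0, 1.0, 0
    while not macro.done(state) and steps < max_len:
        state, reward = step_fn(state, macro.policy(state))
        total += discount * reward
        discount *= gamma
        steps += 1
    return state, total, steps

def rollout_value(state, macros, step_fn, depth=3, gamma=0.95):
    """Random-macro rollout: only `depth` macro-level decisions are taken,
    even though each decision may span many primitive steps."""
    if depth == 0:
        return 0.0
    next_state, reward, steps = simulate_macro(state, random.choice(macros), step_fn, gamma)
    return reward + (gamma ** steps) * rollout_value(next_state, macros, step_fn, depth - 1, gamma)

# Toy usage: integer "corridor" states; each macro walks right until a waypoint,
# and reaching position 10 pays a reward of 1.
walk_to = lambda goal: Macro(policy=lambda s: 1, done=lambda s: s >= goal)
step = lambda s, a: (s + a, 1.0 if s + a == 10 else 0.0)
print(rollout_value(0, [walk_to(g) for g in (3, 7, 10)], step))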
Expectation Maximization for Average Reward Decentralized POMDPs
Planning for multiple agents under uncertainty is often based on decentralized partially observable Markov decision processes (Dec-POMDPs), but current methods must de-emphasize long-term effects of actions by a discount factor. In tasks like wireless networking, agents are evaluated by average performance over time, both short- and long-term effects of actions are crucial, and discounting based s...
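A tiny worked computation (ours, not from that paper) shows why a discount factor de-emphasizes long-term effects: under discounting, a policy with a strong start can look better than a steady policy whose long-run average reward is far higher.

"""Small illustrative computation of how a discount factor de-emphasizes long-run
effects. Policy A earns 1 per step forever; policy B earns 2 for the first 10
steps and nothing afterwards."""

def discounted(rewards, gamma=0.9):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def average(rewards):
    return sum(rewards) / len(rewards)

horizon = 1000
policy_a = [1.0] * horizon                       # steady performer
policy_b = [2.0] * 10 + [0.0] * (horizon - 10)   # strong start, then nothing

# With gamma = 0.9, B's discounted value (~13.0) beats A's (~10.0), even though
# A's average reward per step (1.0) is fifty times B's (0.02).
print("discounted  A:", round(discounted(policy_a), 2), " B:", round(discounted(policy_b), 2))
print("average     A:", round(average(policy_a), 3), " B:", round(average(policy_b), 3))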